uTHCD: A New Benchmarking for Tamil Handwritten OCR

Abstract

Handwritten character recognition has been a challenging research problem in the field of document image analysis for many decades. The reasons include large variation in writing styles, inherent noise in the data, the expansive range of applications it offers, and the non-availability of benchmark databases. Considerable work has been reported in the literature on creating databases for several Indic scripts, but for the Tamil script this effort is still in its infancy, with only one database reported [5]. In this paper, we present the work done in creating an exhaustive, unconstrained Tamil Handwritten Character Database (uTHCD). The database consists of around 91,000 samples, with nearly 600 samples in each of 156 classes. It is a unified collection of both online and offline samples. Offline samples were collected by asking volunteers to write on a form inside a specified grid. For online samples, volunteers wrote in a similar grid using a digital writing pad. The samples encompass a vast variety of writing styles and inherent distortions arising from the offline scanning process, viz. stroke discontinuity, variable stroke thickness, and other distortions. Algorithms that are resilient to such data can be practically deployed in real-time applications. The samples were generated by 650 native Tamil volunteers, including school-going kids, homemakers, university students, and faculty. The isolated character database will be made publicly available as raw images and as a Hierarchical Data File (HDF) compressed file. With this database, we expect to set a new benchmark in Tamil handwritten character recognition and to provide a launchpad for new avenues in the document image analysis domain. The paper also presents an ideal experimental set-up using convolutional neural networks (CNN), with a baseline accuracy of 88% on test data.
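The abstract describes a class-balanced database (156 classes, nearly 600 samples each) and a CNN baseline evaluated on held-out test data. A common way to build such an experimental set-up is a per-class (stratified) train/test split, so that every character class appears in both partitions in the same proportion. The sketch below is a minimal illustration under stated assumptions. The 20% test fraction, the seed, and the function names `stratified_split` and `accuracy` are hypothetical and are not taken from the paper. The CNN itself and the HDF file layout are omitted.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same train/test ratio.

    test_fraction=0.2 is an assumed value, not the ratio used in the paper.
    Returns (train_indices, test_indices), both sorted.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (top-1 accuracy)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels only: 156 hypothetical classes with 10 samples each (not uTHCD data).
labels = [c for c in range(156) for _ in range(10)]
train, test = stratified_split(labels)
print(len(train), len(test))
```

With 10 toy samples per class and a 0.2 test fraction, each class contributes 2 samples to the test set. Stratifying per class, rather than shuffling the whole pool, guarantees that no class is missing from either partition. This matters when reporting a single accuracy figure over all 156 classes.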

Similar Resources

A complete OCR for printed Tamil text

A Neural Network approach is proposed to build an automatic off-line handwritten Tamil character recognition system. We have used a Back Propagation Network (BPN) as a character recognizer. Once trained, the network has a very fast response time. However, the learning phase of this recognizer is a relatively difficult task in this application. The input image of the handwritten character is giv...

OCR for Handwritten Kannada Language Script

Optical character recognition (OCR) is the process of converting a scanned image of text into a computer-editable format. The proposed OCR system is for complex handwritten Kannada characters. One of the major challenges faced by a Kannada OCR system is the recognition of handwritten text from an image. The input text image is subjected to preprocessing and then converted into a binary image. Segment...

Handwritten Document Retrieval System for Tamil Language

The paper attempts to create a handwritten document retrieval system suitable for Tamil language, with a view to record traditional literature content for future reference. It projects a search mechanism to access the query word images using a statistical model based methodology. The scheme revolves around a well defined procedure which results in word models from where the search word can be r...

A Complete OCR System Development of Tamil Magazine Documents

We present an early version of a complete Optical Character Recognition (OCR) system for Tamil magazine documents. All the standard elements of OCR process like deskewing, preprocessing, segmentation, character recognition and reconstruction are implemented. Experience with OCR problems teaches that for most subtasks involved in OCR, there is no single technique that gives perfect results for e...

Beyond OCR: Handwritten Manuscript Attribute Understanding

Historical manuscript dating has always been an important challenge for historians but since countless manuscripts have become digitally available recently, the pattern recognition community has started addressing the dating problem as well. In this chapter, we present a family of local contour fragments (kCF) and stroke fragments (kSF) features and study their application to historical documen...

Journal

Journal title: IEEE Access

Year: 2021

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2021.3096823